# Multi-GPU Parallel Inference
Llama 3.1 405B Instruct FP8
The NVIDIA Llama 3.1 405B Instruct FP8 model is an FP8-quantized version of Meta's Llama 3.1 405B Instruct model. It is an autoregressive language model built on an optimized Transformer architecture, and it can be used for commercial or non-commercial purposes.
Large Language Model
Transformers

nvidia
10.91k
11
BLOOM DeepSpeed Inference FP16
OpenRAIL
BLOOM is an open-source multilingual large language model developed by the BigScience project and designed for efficient text generation.
Large Language Model
Transformers

microsoft
99
12